# 43 Orthogonal projection and best approximation theorem.
🚧🚧(need to add figures)🚧🚧
## Orthogonal decomposition and orthogonal projection.
We now turn to the following problem:
Fix a subspace $W$ of $\mathbb R^{n}$. For any vector $x \in \mathbb R^{n}$, can we decompose $x$ into a sum of two parts $x = x_{||}+ x_{\perp}$, where $x_{||} \in W$ and $x_{\perp} \in W^{\perp}$?
The expression $x = x_{||}+x_{\perp}$ is the _orthogonal decomposition_ of $x$ with respect to the subspace $W$, and we call $x_{||}$ the _orthogonal projection_ of $x$ onto $W$, and $x_{\perp}$ _the component $x$ orthogonal to $W$_.
The answer is yes: such an orthogonal decomposition exists, and we will construct it, provided we believe $W$ has an orthogonal basis.
Let us assume that $u_{1},\ldots ,u_{k}$ is an orthogonal basis for $W$. Then since $x_{||}$ is required to lie in $W$, it must be some linear combination $$
x_{||} = c_{1}u_{1}+\cdots +c_{k}u_{k}
$$But what are these coefficients $c_{i}$? Since $x_{\perp} = x-x_{||}$ is required to lie in $W^{\perp}$, $x_{\perp}$ must be orthogonal to each of the basis vectors $u_{i}$ of $W$. So for each $i$, we have $$
u_{i}\cdot x_{\perp} = u_{i} \cdot (x-x_{||}) = 0
$$This implies that for each $i$ we have $$
\begin{array}{c}
u_{i}\cdot x = u_{i} \cdot x_{||} = u_{i}\cdot(c_{1}u_{1}+\cdots +c_{k}u_{k}) = c_{i}\,u_{i}\cdot u_{i} \\
\displaystyle \implies c_{i} = \frac{u_{i}\cdot x}{u_{i}\cdot u_{i}}
\end{array}
$$which gives us the coefficients $c_{i}$!
And to verify $x_{\perp} = x - x_{||}$ is indeed in $W^{\perp}$, for each basis vector $u_{i}$ of $W$, we compute the dot product $$
\begin{align*}
u_{i} \cdot x_{\perp} &= u_{i}\cdot (x-x_{||}) \\
&= u_{i} \cdot x - u_{i} \cdot x_{||} \\
&= u_{i}\cdot x - u_{i}\cdot(c_{1}u_{1}+\cdots +c_{k}u_{k}) \\
&= u_{i}\cdot x - c_{i}(u_{i}\cdot u_{i}) \\
&= u_{i}\cdot x - \frac{u_{i}\cdot x}{u_{i}\cdot u_{i}}(u_{i}\cdot u_{i}) \\
&= 0.
\end{align*}
$$Hence indeed, $x_{\perp}$ is in $W^{\perp}$.
So we have
> **Orthogonal decomposition with respect to a subspace $W$.**
> If $u_{1},\ldots,u_{k}$ is an orthogonal basis of $W$, then the orthogonal decomposition of $x$ with respect to $W$ is $$
\begin{array}{l}
x = x_{||} + x_{\perp} \quad\text{with }\ \ x_{||} \in W\ \ \ ,\ \ \ x_{\perp} \in W^{\perp} \\
\displaystyle \text{where }\ \ x_{||} =\left( \frac{u_{1}\cdot x}{u_{1}\cdot u_{1}} \right) u_{1} +\cdots +\left( \frac{u_{k}\cdot x}{u_{k}\cdot u_{k}} \right) u_{k}
\end{array}
$$We say $x_{||}$ is the _orthogonal projection of $x$ onto $W$_, and $x_{\perp}$ the _component of $x$ orthogonal to $W$_.
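The boxed formula translates directly into a few lines of code. Here is a minimal numerical sketch in Python with NumPy; the function name `orthogonal_decompose` and the sample vectors are our own illustrative choices, not anything standard.

```python
import numpy as np

def orthogonal_decompose(x, ortho_basis):
    """Split x = x_par + x_perp with x_par in W and x_perp in W-perp,
    given an *orthogonal* basis u_1, ..., u_k of W."""
    x_par = np.zeros_like(x, dtype=float)
    for u in ortho_basis:
        x_par += (u @ x) / (u @ u) * u   # c_i = (u_i . x) / (u_i . u_i)
    return x_par, x - x_par

# sanity check on W = span{(1,0,0), (0,1,1)}, an orthogonal pair
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0])
x = np.array([2.0, 3.0, 5.0])
x_par, x_perp = orthogonal_decompose(x, [u1, u2])
# x_perp should be orthogonal to every basis vector of W
print(np.isclose(u1 @ x_perp, 0) and np.isclose(u2 @ x_perp, 0))  # True
```

Note the function assumes the basis is orthogonal; feeding it a non-orthogonal basis silently gives a wrong answer, exactly as the remark below warns.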
Now it may seem like the orthogonal decomposition of $x = x_{||}+x_{\perp}$ with respect to $W$ depends on the choice of the orthogonal basis for $W$, but it actually does not. This decomposition is unique:
> **Uniqueness of orthogonal decomposition with respect to a subspace $W$.**
> Let $W$ be a subspace of $\mathbb R^{n}$. If the orthogonal decomposition of $x$ with respect to $W$ exists, that is, $x= x_{||}+x_{\perp}$ where $x_{||} \in W$ and $x_{\perp}\in W^{\perp}$, then it is unique.
$\blacktriangleright$ Proof. Suppose one can orthogonally decompose a vector $x$ with respect to subspace $W$ in two ways, $$
x = x_{||}+x_{\perp} \quad\text{and}\quad x=x_{||}'+ x_{\perp}'
$$where $x_{||},x_{||}' \in W$ and $x_{\perp},x_{\perp}' \in W^{\perp}$ . Then we have equality $$
\begin{align*}
x_{||}+x_{\perp} &= x_{||}'+x_{\perp}' \\
\implies \underbrace{x_{||}-x_{||}'}_{\in W} &= \underbrace{x_{\perp}'-x_{\perp}}_{\in W^{\perp}}
\end{align*}
$$Note the quantity $x_{||}-x_{||}'$ is in $W$ but equals an element of $W^{\perp}$. This means $$
x_{||}-x_{||}' \in W\cap W^{\perp}
$$But the intersection $W\cap W^{\perp} = \{\vec0\}$, so we have $x_{||}-x_{||}'=\vec 0$, namely $x_{||}=x_{||}'$, and hence also $x_{\perp}=x_{\perp}'$. In other words, the orthogonal decomposition $x=x_{||}+x_{\perp}$ with respect to $W$, where $x_{||}\in W$ and $x_{\perp}\in W^{\perp}$ is unique. $\blacksquare$
Let us introduce the notation $$
\text{proj}_{u}(x) = \left( \frac{u \cdot x}{u \cdot u} \right) u
$$for the orthogonal projection of $x$ onto the line spanned by $u$. Let us also write $$
\text{proj}_{W} (x) = x_{||}
$$Then we can restate the orthogonal projection of $x$ onto $W$ as follows:
> Let $W$ be a subspace with an orthogonal basis $u_{1},u_{2},\ldots,u_{k}$. Then the orthogonal projection of $x$ onto $W$ is $$
\text{proj}_{W}(x) = \text{proj}_{u_{1}}(x)+ \text{proj}_{u_{2}}(x) + \cdots + \text{proj}_{u_{k}}(x)
$$where $$
\text{proj}_{u}(x) = \left( \frac{u \cdot x}{u \cdot u} \right) u.
$$That is, the orthogonal projection of a vector $x$ onto a subspace $W$ is the sum of orthogonal projections of $x$ onto each basis vector of an orthogonal basis of $W$.
**Remark.** It is important that $u_{1},u_{2},\ldots,u_{k}$ is an orthogonal basis of $W$ when using the above formula to compute the orthogonal projection; otherwise the result will be incorrect! Using a process called the _Gram-Schmidt process_, which we will see later, we can turn any basis of $W$ into an orthogonal basis. Hence an orthogonal basis for a subspace $W$ of $\mathbb R^{n}$ always exists, and therefore one can always perform the orthogonal decomposition of a vector with respect to $W$.
**Example.**
Consider $W =\text{span}\{\begin{pmatrix}1\\1\\2\end{pmatrix},\begin{pmatrix}-3\\1\\1\end{pmatrix}\}$ and let $x = \begin{pmatrix}1\\1\\1\end{pmatrix}$. Find $\text{proj}_{W}(x)$, the orthogonal projection of $x$ onto $W$, and compute $x_{\perp}$, the component of $x$ orthogonal to $W$. Then check that $x_{\perp}$ and $\text{proj}_{W}(x)$ are actually orthogonal to each other.
$\blacktriangleright$ First we check that we have an orthogonal basis for $W$; this is important. Indeed $$
u_{1} = \begin{pmatrix}1\\1\\2\end{pmatrix},u_{2}=\begin{pmatrix}-3\\1\\1\end{pmatrix}
$$is an orthogonal basis for $W$, since $u_{1}\cdot u_{2} = -3+1+2 = 0$. So we can directly compute $$
\begin{align*}
\text{proj}_{W}(x)& =\text{proj}_{u_{1}}(x) + \text{proj}_{u_{2}}(x) \\
&=\frac{u_{1} \cdot x}{u_{1}\cdot u_{1}} u_{1}+ \frac{u_{2}\cdot x}{u_{2}\cdot u_{2}}u_{2} \\
&= \frac{4}{6} \begin{pmatrix}1\\1\\2\end{pmatrix} + \frac{-1}{11} \begin{pmatrix}-3\\1\\1\end{pmatrix} \\
\implies \text{proj}_{W}(x)& = \begin{pmatrix}31 / 33 \\
19 / 33 \\
41 / 33\end{pmatrix}.
\end{align*}
$$
And $x_{\perp}$, the component of $x$ orthogonal to $W$ is just $x -\text{proj}_{W}(x)$, namely $$
x_{\perp} = x - \text{proj}_{W}(x) = \begin{pmatrix}1\\1\\1\end{pmatrix}-\begin{pmatrix}31 / 33 \\
19 / 33 \\
41 / 33\end{pmatrix}= \begin{pmatrix}2 / 33 \\
14 / 33 \\
-8 / 33\end{pmatrix}.
$$
And indeed, if we compute $$
\text{proj}_{W}(x) \cdot x_{\perp} = \begin{pmatrix}31 / 33 \\
19 / 33 \\
41 / 33\end{pmatrix} \cdot \begin{pmatrix}2 / 33 \\
14 / 33 \\
-8 / 33\end{pmatrix} = \frac{62+ 266-328}{(33)^{2}} = 0
$$so $\text{proj}_{W}(x)$ is orthogonal to $x_{\perp}$ as expected. $\blacklozenge$
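The arithmetic above is easy to double-check numerically. Here is a short NumPy sketch reproducing the fractions with denominator $33$:

```python
import numpy as np

u1 = np.array([1.0, 1.0, 2.0])
u2 = np.array([-3.0, 1.0, 1.0])
x = np.array([1.0, 1.0, 1.0])

# proj_W(x) = proj_{u1}(x) + proj_{u2}(x), valid since u1 . u2 = 0
proj = (u1 @ x) / (u1 @ u1) * u1 + (u2 @ x) / (u2 @ u2) * u2
x_perp = x - proj

print(np.round(proj * 33))           # ≈ [31. 19. 41.]
print(np.round(x_perp * 33))         # ≈ [ 2. 14. -8.]
print(np.isclose(proj @ x_perp, 0))  # True
```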
**Example.** (A simple case of finding an orthogonal basis for a 2-dimensional subspace.)
Let $W =\text{span}\{\begin{pmatrix}1\\1\\1\end{pmatrix},\begin{pmatrix}1\\2\\3\end{pmatrix} \}$, and a vector $x = \begin{pmatrix}3\\2\\1\end{pmatrix}$.
(a) Find an orthogonal basis for $W$.
(b) Compute $\text{proj}_{W}(x)$.
$\blacktriangleright$
(a) Note the two vectors $w_{1} = \begin{pmatrix}1\\1\\1\end{pmatrix},w_{2}=\begin{pmatrix}1\\2\\3\end{pmatrix}$ that span $W$ are not orthogonal to each other. But one can make a new basis $u_{1},u_{2}$ for $W$ that is orthogonal.
First let us fix $u_{1} = w_{1}$. Now, $u_{1}$ is not orthogonal to $w_{2}$. But what we can do is orthogonally project $w_{2}$ onto $u_{1}$, then subtract that component to get a vector orthogonal to $u_{1}$! That is, take $$
u_{2} = w_{2} - \text{proj}_{u_{1}}w_{2}
$$In this case, $u_{1}$ and $u_{2}$ are now orthogonal and still span $W$!
So, in our case, $$
\begin{align*}
u_{1} & = w_{1} = \begin{pmatrix}1\\1\\1\end{pmatrix}\\
u_{2} &= w_{2} - \text{proj}_{u_{1}}(w_{2}) = \begin{pmatrix}1\\2\\3\end{pmatrix} - \frac{\begin{pmatrix}1\\1\\1\end{pmatrix}\cdot\begin{pmatrix}1\\2\\3\end{pmatrix}}{\begin{pmatrix}1\\1\\1\end{pmatrix}\cdot\begin{pmatrix}1\\1\\1\end{pmatrix}}\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}-1\\0\\1\end{pmatrix}
\end{align*}
$$So an orthogonal basis for $W$ is $$
\begin{pmatrix}1\\1\\1\end{pmatrix},\begin{pmatrix}-1\\0\\1\end{pmatrix}.
$$
This is in fact the _Gram-Schmidt process_ for 2-dimensional subspaces.
(b) To compute $\text{proj}_{W}(x)$, we use the orthogonal basis $u_{1},u_{2}$ we just computed. So $$
\begin{align*}
\text{proj}_{W}(x)& = \text{proj}_{u_{1}}(x)+\text{proj}_{u_{2}}(x) \\
& = \frac{u_{1} \cdot x}{u_{1}\cdot u_{1}} u_{1}+ \frac{u_{2}\cdot x}{u_{2}\cdot u_{2}}u_{2} \\
&= \frac{6}{3}\begin{pmatrix}1\\1\\1\end{pmatrix}+\frac{-2}{2}\begin{pmatrix}-1\\0\\1\end{pmatrix} = \begin{pmatrix}3\\2\\1\end{pmatrix}.\quad\blacklozenge
\end{align*}
$$
Before we move on from this problem, notice how $\text{proj}_{W}(x)$ equals $x$ itself in this example. This is because $x$ happens to be in $W$ to begin with! This is a happy accident, but don't expect this to always be the case. Let us record this:
> If $x \in W$, then $\text{proj}_{W}(x) = x$.
Let us also record the method we just used to produce an orthogonal basis for a 2-dimensional subspace:
>**Gram-Schmidt for 2-dimensional subspaces.**
>Let $W$ be a subspace spanned by a basis set $w_{1},w_{2}$. Then we obtain an orthogonal basis $u_{1},u_{2}$ for $W$ by taking $$
\begin{align*}
u_{1} &= w_{1} \\
u_{2} &= w_{2}-\text{proj}_{u_{1}}(w_{2}).
\end{align*}
$$
One should draw a picture to see why this is the case. The higher dimensional process is analogous, which we will discuss in more detail later; I will write it down here for now:
Given $w_{1},w_{2},\ldots,w_{k}$ any basis for $W$. Then we can obtain an orthogonal basis $u_{1},u_{2},\ldots,u_{k}$ of $W$ by computing $$
\begin{align*}
u_{1} &= w_{1} \\
u_{2} &= w_{2} - \text{proj}_{u_1}(w_{2}) \\
u_{3} &= w_{3} - \text{proj}_{u_{1}}(w_{3}) - \text{proj}_{u_{2}}(w_{3}) \\
u_{4} &= w_{4} - \text{proj}_{u_{1}}(w_{4}) - \text{proj}_{u_{2}}(w_{4}) - \text{proj}_{u_{3}}(w_{4}) \\
&\ \ \vdots
\end{align*}
$$
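The recursion above can be written as a short routine. Here is a sketch in Python with NumPy; note it subtracts each projection from the running, partially-orthogonalized vector (the so-called modified Gram-Schmidt), which agrees with the displayed formula in exact arithmetic but behaves better in floating point.

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a basis w_1, ..., w_k of W into an orthogonal basis u_1, ..., u_k."""
    ortho = []
    for w in vectors:
        u = np.array(w, dtype=float)
        for v in ortho:
            u -= (v @ u) / (v @ v) * v  # subtract the projection onto each earlier u_i
        ortho.append(u)
    return ortho

# the 2-dimensional example from above
w1, w2 = [1.0, 1.0, 1.0], [1.0, 2.0, 3.0]
u1, u2 = gram_schmidt([w1, w2])
print(u1, u2)  # ≈ [1. 1. 1.]  [-1.  0.  1.]
```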
## Properties of orthogonal projection and its matrix.
If we take any subspace $W$, then from the construction of $\text{proj}_{W}(x)$ we can see that $\text{proj}_{W}$ is a linear transformation (it is just a sum of a bunch of dot products). So one may wonder what the standard matrix for $\text{proj}_{W}$ is. Here it is:
> **Matrix of an orthogonal projection.**
> Let $u_{1},u_{2},\ldots,u_{k}$ be an _orthonormal basis_ of $W$, then $$
\text{proj}_{W} =QQ^{T}
where $Q =\begin{bmatrix}|&|&&|\\u_{1}&u_{2}&\cdots&u_{k}\\|&|&&|\end{bmatrix}$ is the matrix whose columns are the vectors of an orthonormal basis of $W$.
$\blacktriangleright$ Proof. Since $u_{i}\cdot u_{j}=\delta_{ij}$, as they form an orthonormal basis, we have $$
\begin{align*}
\text{proj}_{W}(x) &= \text{proj}_{u_{1}}(x) + \text{proj}_{u_{2}}(x)+\cdots + \text{proj}_{u_{k}}(x) \\
&=(u_{1}\cdot x)u_{1} + (u_{2}\cdot x)u_{2}+\cdots + (u_{k}\cdot x)u_{k} \\
&= \begin{bmatrix}|&|&&|\\u_{1}&u_{2}&\cdots&u_{k}\\|&|&&|\end{bmatrix}\begin{pmatrix}u_{1}\cdot x\\u_{2}\cdot x\\\vdots\\u_{k}\cdot x\end{pmatrix} \\
&= \begin{bmatrix}|&|&&|\\u_{1}&u_{2}&\cdots&u_{k}\\|&|&&|\end{bmatrix}\begin{pmatrix}u_{1}^{T}x\\u_{2}^{T} x\\\vdots\\u_{k}^T x\end{pmatrix} \\
&= \begin{bmatrix}|&|&&|\\u_{1}&u_{2}&\cdots&u_{k}\\|&|&&|\end{bmatrix}\begin{bmatrix}u_{1}^{T}\\u_{2}^{T}\\\vdots\\u_{k}^{T}\end{bmatrix}\begin{pmatrix}|\\x\\|\end{pmatrix} \\
&=QQ^{T}x.
\end{align*}
$$Since for all $x$ we have $\text{proj}_{W}(x) = QQ^{T}x$, we therefore have the standard matrix $\text{proj}_{W}= QQ^{T}$. $\blacksquare$
**Example.**
In the above example where $W$ has orthogonal basis $$
\begin{pmatrix}1\\1\\1\end{pmatrix},\begin{pmatrix}-1\\0\\1\end{pmatrix},
$$we can construct the projection matrix for $\text{proj}_{W}$ by computing $QQ^{T}$, but we need an orthonormal basis. If we normalize each of the vectors in the orthogonal basis set, we get $$
\frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\1\end{pmatrix}, \frac{1}{\sqrt{2}}\begin{pmatrix}-1\\0\\1\end{pmatrix},
$$so we have $\text{proj}_{W}=QQ^{T}$ where $$
Q = \begin{pmatrix} \frac{1}{\sqrt{3}} &-\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} &0\\ \frac{1}{\sqrt{3}}& \frac{1}{\sqrt{2}}\end{pmatrix}
$$Hence $$
QQ^{T} = \begin{pmatrix} \frac{5}{6} & \frac{1}{3} & -\frac{1}{6} \\
\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\
-\frac{1}{6} & \frac{1}{3} & \frac{5}{6}
\end{pmatrix}
$$And if we want to compute $\text{proj}_{W}(x)$, we can compute $QQ^{T}x$ instead. Check that indeed $$
\text{proj}_{W}\begin{pmatrix}1\\2\\3\end{pmatrix} = QQ^{T} \begin{pmatrix}1\\2\\3\end{pmatrix} = \begin{pmatrix}1\\2\\3\end{pmatrix}
$$as we have discovered earlier: $\begin{pmatrix}1\\2\\3\end{pmatrix}$ is in $W$, so it projects to itself. $\blacklozenge$
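This projection matrix is easy to reproduce numerically. A short NumPy sketch:

```python
import numpy as np

# orthonormal basis of W, as in the example
u1 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
u2 = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2)
Q = np.column_stack([u1, u2])
P = Q @ Q.T  # the matrix of proj_W

print(np.round(P * 6))  # ≈ [[ 5.  2. -1.], [ 2.  2.  2.], [-1.  2.  5.]]
x = np.array([1.0, 2.0, 3.0])
print(np.allclose(P @ x, x))  # True: this x lies in W, so proj_W(x) = x
```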
We can go a bit further. Since $\text{proj}_{u}(x) = (u\cdot x)u = uu^{T}x$ when $u$ is a unit vector, we can also write
> **Orthogonal projection as a sum of rank 1 orthogonal projections.**
> If $u_{1},u_{2},\ldots,u_{k}$ is an orthonormal basis of $W$, then the orthogonal projection $\text{proj}_{W}$ has matrix $$
\text{proj}_{W} = u_{1}u_{1}^{T}+ u_{2}u_{2}^{T}+\cdots + u_{k}u_{k}^{T}.
$$
In any case, this settles the mystery for an $n\times k$ matrix $Q$ with orthonormal columns:
- $Q^{T}Q=I_{k\times k}$
- $QQ^{T}=\text{proj}_{W}$, where $W$ is the columnspace of $Q$.
Intuitively, orthogonal projections are _idempotent_: performing an orthogonal projection twice is the same as performing it once:
> For $W$ a subspace, $(\text{proj}_{W})^{2}=\text{proj}_{W}$
Indeed, if we compute $$
(QQ^{T})^{2}= QQ^{T} QQ^{T}=QIQ^{T}=QQ^{T}.
$$
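Both identities, $Q^{T}Q = I_{k\times k}$ and $(QQ^{T})^{2} = QQ^{T}$, are easy to confirm numerically for the $Q$ from the example above (a NumPy sketch):

```python
import numpy as np

Q = np.column_stack([np.array([1.0, 1.0, 1.0]) / np.sqrt(3),
                     np.array([-1.0, 0.0, 1.0]) / np.sqrt(2)])
P = Q @ Q.T

print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q^T Q = I_k
print(np.allclose(P @ P, P))            # True: proj_W is idempotent
```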
And, we have
> For $W$ a subspace of $\mathbb R^{n}$, $$
\begin{array}{l}
\text{the range of \(\text{proj}_W\) is \(W\)} \\
\text{the kernel of \(\text{proj}_W\) is \(W^\perp\)}.
\end{array}
$$
Hence by rank-nullity theorem, we have the corollary that
> For $W$ a subspace of $\mathbb R^{n}$, $$
\dim W + \dim W^{\perp} = n.
$$
## Best approximation theorem.
This vector $\text{proj}_{W}(x)$, the orthogonal projection of a vector $x$ onto $W$, is the vector in $W$ closest to $x$ in distance. That is, it is the vector that best approximates $x$ if we are constrained to $W$. To make this precise,
> **Best approximation theorem.**
> Let $W$ be a subspace in $\mathbb R^{n}$, and let $x$ be any vector in $\mathbb R^{n}$. Then for any vector $w \in W$, we always have $$
\Vert \text{proj}_{W}(x) - x\Vert \le \Vert w - x\Vert.
$$
> In other words, $\text{proj}_{W}(x)$ is the vector in $W$ closest to $x$.
$\blacktriangleright$ Proof. First we have $x = x_{||} + x_{\perp}=\text{proj}_{W}(x) + x_{\perp}$. Now we compute $$
\begin{align*}
\Vert w - x\Vert^{2} &= \Vert \underbrace{ w - \text{proj}_{W}(x)}_{\in W}-\underbrace{x_{\perp}}_{\in W^{\perp}}\Vert^{2} \\
&=\Vert w - \text{proj}_{W}(x) \Vert^{2} + \Vert x_{\perp}\Vert^{2} \quad \text{(by the Pythagorean theorem)} \\
&\ge \Vert x_{\perp} \Vert^{2} \\
\implies \Vert w- x\Vert^{2} &\ge\Vert x-\text{proj}_{W}(x)\Vert^{2}
\end{align*}
$$
Hence this shows $\Vert\text{proj}_{W}(x)-x\Vert \le \Vert w-x\Vert$. $\blacksquare$
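As a final sanity check, here is a NumPy sketch that samples many vectors $w \in W$ and confirms none of them is closer to $x$ than $\text{proj}_W(x)$; the random subspace, dimensions, and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# orthonormal basis of a random 2-dimensional subspace W of R^5,
# via the QR factorization of a random 5x2 matrix
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))
P = Q @ Q.T

x = rng.standard_normal(5)
best_dist = np.linalg.norm(P @ x - x)  # distance from x to proj_W(x)

# sample many w in W; none should beat the projection
dists = [np.linalg.norm(Q @ rng.standard_normal(2) - x) for _ in range(1000)]
print(best_dist <= min(dists) + 1e-12)  # True
```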